How’s that done?
library(tidyverse)
library(readxl)
library(janitor)
library(DT)
CPI22 <- readxl::read_excel(path="data/CPI2022_GlobalResultsTrends.xlsx", sheet=1, skip=2) %>% clean_names()
datatable(CPI22)Robert W. Walker
March 3, 2023
Last updated: 2023-03-17 09:34:01
Timezone: America/Los_Angeles
Transparency International provides a wealth of interesting data; I want to work with their Corruption Perceptions Index. The data can be obtained in an Excel spreadsheet. Here’s a brief shot of the file. The main object of interest, throughout, is the cpi_score – the corruption perception index.

These data have a first two rows that will need to be skipped and the names are terrible but we can use janitor’s clean_names to take care of that. The other thing to notice is three sheets. The second sheet will need some tidying and the third sheet is not all that interesting, to me. Let’s import the first one.
We also have the time series data. Let’s first import them.
These data require some tidying with pivot_longer; we will want to grab a cpi_score, rank, sources, and standard_error for each year that we have data. It is worth noting that the ranks only go back to 2017. There are harder and easier ways to do this. I wrote a quick function to take two inputs and then pivot each of the four series separately.
cleaner <- function(data, string) {
# Start with the data
data %>%
# Use the iso3 as ID and keep everything that starts with string
select(iso3, starts_with(string)) %>%
# pivot those variables except iso3
pivot_longer(.,
cols=-iso3,
# names_prefix needs to remove string_
names_prefix = paste0(string,"_",sep=""),
# make what's left of the names the year -- it will be a four digit year
names_to = "year",
# make the values named string
values_to=string)
}
CPI.TS.Tidy <- cleaner(CPI.Time,"cpi_score")
Sources.TS.Tidy <- cleaner(CPI.Time,"sources")
StdErr.TS.Tidy <- cleaner(CPI.Time,"standard_error")
Rank.TS.Tidy <- cleaner(CPI.Time, "rank")Now I can join them back together.
The third sheet is a set of statistically significant changes that I do not so much care about.
| Name | Piped data |
| Number of rows | 1991 |
| Number of columns | 6 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 5 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| iso3 | 0 | 1 | 3 | 3 | 0 | 181 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| year | 0 | 1.00 | 2017.00 | 3.16 | 2012.00 | 2014.00 | 2017.00 | 2020.00 | 2022.00 | ▇▅▅▅▅ |
| cpi_score | 42 | 0.98 | 43.03 | 19.27 | 8.00 | 29.00 | 38.00 | 56.00 | 92.00 | ▃▇▅▂▂ |
| sources | 42 | 0.98 | 6.72 | 1.84 | 3.00 | 5.00 | 7.00 | 8.00 | 10.00 | ▃▂▇▅▃ |
| standard_error | 42 | 0.98 | 2.89 | 1.55 | 0.41 | 1.85 | 2.51 | 3.49 | 12.81 | ▇▅▁▁▁ |
| rank | 911 | 0.54 | 89.04 | 51.63 | 1.00 | 45.00 | 87.00 | 132.50 | 180.00 | ▇▇▇▇▇ |
The sf package has special merge methods that I will deploy to combine the two bits of data.
Now to draw a map.
# create world map using ggplot() function
Map.Data <- Map.Data %>% mutate(tooltip = paste0(sovereignt,"<br>",year,"<br>CPI: ",cpi_score, sep=""))
Map.Res <- Map.Data %>%
dplyr::filter(year==2022L) %>%
ggplot(.) +
geom_sf(aes(fill=cpi_score, text=tooltip), size=0.1, alpha=0.8) +
scale_fill_viridis_c() +
theme_void() +
labs(title="Perceived Corruption around the World in 2022",
caption="Data from Transparency International",
fill = "CPI") + theme(legend.position="bottom")
Map.Res
This is enough to wrap as a page for a shiny app. The year selector and the two maps.
I want a second page that will only display the 2022 results in CPI22. There are tons of measures there, many with a lot of missing data, but those are only for one year. The selector here is going to be the series to plot.